
134 research outputs found

Unsupervised Data Augmentation for Less-Resourced Languages with no Standardized Spelling

Author: Fort Karën
Millour Alice
Publication venue: HAL CCSD
Publication date: 02/09/2019

Non-standardized languages are a challenge to the construction of representative linguistic resources and to the development of efficient natural language processing tools: when spelling is not determined by a consensual norm, a multiplicity of alternative written forms can be encountered for a given word, inducing a large proportion of out-of-vocabulary words. To embrace this diversity, we propose a methodology based on crowdsourcing alternative spellings from which variation rules are automatically extracted. The rules are further used to match out-of-vocabulary words with one of their spelling variants. This virtuous process enables the unsupervised augmentation of multi-variant lexicons without requiring manual rule definition by experts. We apply this multilingual methodology to Alsatian, a French regional language, and provide (i) an intrinsic evaluation of the correctness of the obtained variant pairs, and (ii) an extrinsic evaluation on a downstream task: part-of-speech tagging. We show that in a low-resource scenario, collecting spelling variants for only 145 words can lead to (i) the generation of 876 additional variant pairs, and (ii) a reduction in out-of-vocabulary words that improves tagging performance by 1 to 4%.

Éthique et TAL : ce dont on parle, ce dont on ne parle plus, ce dont on ne parle pas (un état de l'art)

Author: Fort Karën
Publication venue: HAL CCSD
Publication date: 14/11/2022

In recent years, ethics has become a recognized topic in AI and, more specifically, in natural language processing (NLP). This recent development is due to several factors, including the fact that NLP has become commercially profitable enough to leave research laboratories and pervade our daily lives, with consequences immediately visible to the general public. In this presentation I will review how the topic has evolved over the last decade, during which some issues became self-evident (such as the remuneration of click workers) and are no longer discussed, while others (notably biases in language models) take center stage, overshadowing the most difficult questions. Ample time will be left for discussion, to allow an exchange of views on these topics.

INRIA a CCSD electronic archive server

Productions participatives de corpus annotés : des modèles encore incertains

Author: Fort Karën
Publication venue: HAL CCSD
Publication date: 07/11/2019

INRIA a CCSD electronic archive server

Extending the adverbial coverage of a French wordnet

Author: Fort Karën
Sagot Benoît
Venant Fabienne
Publication venue
Publication date: 13/05/2009

Proceedings of the NODALIDA 2009 workshop WordNets and other Lexical Semantic Resources — between Lexical Semantics, Lexicography, Terminology and Formal Ontologies. Editors: Bolette Sandford Pedersen, Anna Braasch, Sanni Nimb and Ruth Vatvedt Fjeld. NEALT Proceedings Series, Vol. 7 (2009), 33-37. © 2009 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/9209

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

DSpace at Tartu University Library

Extension et couplage de ressources syntaxiques et sémantiques sur les adverbes

Author: Fort Karën
Sagot Benoît
Venant Fabienne
Publication venue: John Benjamins Publishing Company
Publication date: 10/09/2008

This paper presents work on extending the adverbial entries of the WOLF, a semantic lexical resource for French, and connecting them with those of the syntactic lexicon Lefff, which were mostly extracted from the lexicon-grammar tables of Molinier & Levrier (2000). This work relies on exploiting derivation and synonymy relations; the latter are extracted from the DicoSyn synonyms database. The resulting semantic resource, which is freely available, is manually evaluated and validated in an exhaustive manner.

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

From the Ground Up: Developing a Practical Ethical Methodology for Integrating AI into Industry

Author: Anderson Marc
Fort Karën
Publication venue: Springer Science and Business Media LLC
Publication date: 11/07/2022

In this article we present a new approach to practical artificial intelligence (AI) ethics in heavy industry, which was developed in the context of an EU Horizon 2020 multi-partner project. We begin with a review of the concept of Industry 4.0, discussing the limitations of the concept, and of iterative categorization of heavy industry generally, for a practical human-centered ethical approach. We then proceed to an overview of actual and potential AI ethics approaches to heavy industry, suggesting that current approaches, with their emphasis on broad high-level principles, are not well suited to AI ethics for industry. From there we outline our own approach in two sections. The first suggests tailoring ethics to the time and space situation of the shop-floor-level worker from the ground up, including giving specific and evolving ethical recommendations. The second describes the ethicist's role as an ethical supervisor immersed in the development process and interpreting between industrial and technological (tech) development partners. In presenting our approach we draw heavily on our own experiences in applying the method in the Use Cases of our project, as examples of what can be done.

INRIA a CCSD electronic archive server

HAL Descartes

Les jeux ayant un but : des sciences participatives ?

Author: Fort Karën
Guillaume Bruno
Publication venue: Paris : Ministère de la Culture et de la Communication
Publication date: 27/12/2019

Games with a purpose are games that hide their real purpose: the production of data. They therefore fall under crowdsourcing, which covers fragmented microwork as well as volunteer platforms such as Wikipedia. To what extent can these games be considered part of citizen science? What are their specific features?

INRIA a CCSD electronic archive server

HAL Descartes

Éthique et traitement automatique des langues

Author: Amblard Maxime
Fort Karën
Publication venue: HAL CCSD
Publication date: 02/07/2018

INRIA a CCSD electronic archive server

Sciences participatives et diversité linguistique Retours d'expériences

Author: Fort Karën
Millour Alice
Publication venue: Paris : Ministère de la Culture et de la Communication
Publication date: 28/12/2019

Some languages suffer from a lack of resources in the broad sense, whether human, linguistic, or financial, in particular for producing the natural language processing tools needed for their digital integration. For these so-called "less-resourced" languages, crowdsourcing appears to be a promising way to take advantage of the growing presence of speakers on the Internet.

INRIA a CCSD electronic archive server

Ethical Internal Logistics 4.0: Observations and Suggestions from a Working Internal Logistics Case

Author: Anderson Marc
Fort Karën
Publication venue: HAL CCSD
Publication date: 22/09/2022

In this paper we present our experiences and insights from a Use Case in heavy industry, where optical character recognition (OCR) is combined with algorithms to correctly identify labels for additives to be introduced into a production process. Ethical issues are presented relative to the effects of the Use Case upon the shop floor operators using the new technology. We then discuss the recommendations given and our success in getting them implemented. An argument follows regarding what we view as the source of many of the ethical issues: the unreflective acceptance of Industry 4.0 and Internal Logistics 4.0 as a generalized and idealized 'plan' to which technological development and the human operator have to adapt. We contrast this with an approach where the needs of the human in the work context would drive and limit Internal Logistics 4.0 development as a set of gradual improvements tailored to the worker's situation.

INRIA a CCSD electronic archive server